"In vivo" spam filtering: A challenge problem for data mining

Author

  • Tom Fawcett
Abstract

Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email communication. Many data mining researchers have addressed the problem of detecting spam, generally by treating it as a static text classification problem. True in vivo spam filtering has characteristics that make it a rich and challenging domain for data mining. Indeed, real-world datasets with these characteristics are typically difficult to acquire and to share. This paper demonstrates some of these characteristics and argues that researchers should pursue in vivo spam filtering as an accessible domain for investigating them.

Similar resources

A Suitable Method for Classifying Advertising Emails Based on User Profiles

In general, whether an email is spam depends on whether it satisfies the client, not on the content of the client’s email. Under this definition, problems arise in marketing and advertising: for example, some advertising emails may be spam for some users and not for others. To deal with this problem, many researchers design an anti-s...

Towards Symbiotic Spam E-mail Filtering

This position paper discusses the use of symbiotic filtering, a novel distributed data mining approach that combines content-based and collaborative filtering for spam detection.

Detection and Filtering Spam using Feature Selection and Learning Machine Methods

In recent years, email has become one of the most pervasive and cost-effective communication tools. Meanwhile, spam has reduced its appeal and become a nuisance to its users. Email filtering is the first solution for coping with this challenge, and it is developed as a special type of text classification. A variety of methods includi...

Classifying Unsolicited Bulk Email (UBE) using Python Machine Learning Techniques

Email has become one of the fastest and most economical forms of communication. However, the growing number of email users has led to a dramatic increase in spam over the past few years. Because spammers always try to find ways to evade existing filters, new filters need to be developed to catch spam. Generally, the main tool for email filtering is based on text classification. A classifi...

Improved Sequential Pattern Mining Using an Extended Bitmap Representation

The main challenge of mining sequential patterns is the high processing cost of support counting for a large number of candidate patterns. To solve this problem, the SPAM algorithm was proposed in SIGKDD’2002; it uses a depth-first traversal of the search space combined with a vertical bitmap representation to provide efficient support counting. According to its experimental results, SPAM o...

Journal title:
  • CoRR

Volume cs.AI/0405007  Issue

Pages  -

Publication date 2003